Part 1: Automobile

Import and warehouse data

Data cleansing, data analysis and visualisation

Machine learning

K-Means
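The regressions with K-Means clusters give the models the cluster assignment as extra information. The document does not say how the clusters were passed to the regressors. The sketch below assumes the cluster label is appended to each row as one-hot columns. It implements Lloyd's K-Means algorithm in pure Python. The helper names (`kmeans`, `add_cluster_feature`), the toy points and `k=2` are all hypothetical, and this is not the project's own code.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda c: squared_distance(point, centroids[c]))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: return (centroids, labels) for a list of points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k randomly chosen points as initial centroids
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]  # assignment step
        new_centroids = []
        for c in range(k):  # update step: centroid = mean of its members
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # no centroid moved, so the clustering is stable
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]  # labels for the final centroids
    return centroids, labels


def add_cluster_feature(points, labels, k):
    """Append a one-hot encoding of each point's cluster label to its features."""
    return [list(p) + [1.0 if lab == c else 0.0 for c in range(k)]
            for p, lab in zip(points, labels)]


# Hypothetical toy data: two well-separated groups of two-feature rows.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
augmented = add_cluster_feature(points, labels, k=2)
print(labels)
print(augmented[0])
```

Each row in `augmented` has its original features plus the cluster columns. You can pass it to any regressor in place of the original rows. Cluster labels from hierarchical clustering could be appended the same way.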

Hierarchical clustering

Regression with original data

Regression with hierarchical clusters

Regression with K-Means clusters
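The regression models in this part are compared by their R² score. R² compares a model's squared errors with those of always predicting the mean of the target. A value of 1 means perfect predictions and 0 means no better than the mean. So the best score reported in this part, 88.3%, means that model explains about 88.3% of the target's variance on the data it was scored on. The sketch below is a minimal pure-Python version of the standard definition, which is the same one scikit-learn's `r2_score` computes. The values are hypothetical and only exercise the formula.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot


# Hypothetical targets and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]
print(r2_score(y_true, y_pred))  # about 0.975
```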

5 & 6. Regression models have been built on the original data, on the hierarchical clusters and on the K-Means clusters. The four regression models used in the project are the Bayesian Ridge Regressor, the Support Vector Regressor, the Bagging Regressor and the CatBoost Regressor.

The combination of K-Means clusters and the CatBoost Regressor gave the highest R² score: 88.3%.

The data analysis plots show that miles per gallon (mpg) has improved over the years. Possible reasons are continued innovation in the car industry or the construction of new roads in the city, which might have reduced traffic.

To improve the dataset, it would be useful to add fuel prices over the years and the average kilometers driven.

Part 2: Wine Quality

After applying the GaussianCopula model, the classification accuracy increased. The accuracy scores were computed using logistic regression.
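A Gaussian copula model describes each column by its own marginal distribution. It captures the dependence between columns with a multivariate normal distribution on transformed values. Synthesizers such as SDV's `GaussianCopula` use this idea to generate new rows that resemble the original data, but the document does not say which library was used, so that is an assumption.

The two-column, pure-Python sketch below is a simplified illustration, not the SDV API. It works in three steps:

- It uses empirical marginals for each column.
- It estimates one correlation from rank-based normal scores.
- It samples by pushing correlated normal draws back through the empirical quantiles.

The helper names and both feature columns are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def normal_scores(column):
    """Map each value to a N(0, 1) score through its rank (empirical CDF)."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))  # rank / (n + 1) lies strictly in (0, 1)
    return scores


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def empirical_quantile(sorted_values, u):
    """Inverse empirical CDF with linear interpolation, for u in [0, 1]."""
    pos = u * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def fit_gaussian_copula(x, y):
    """Store both marginals and the correlation of their normal scores."""
    rho = pearson(normal_scores(x), normal_scores(y))
    rho = max(-1.0, min(1.0, rho))  # guard against rounding just outside [-1, 1]
    return {"rho": rho, "x": sorted(x), "y": sorted(y)}


def sample(model, n, seed=0):
    """Draw n synthetic (x, y) rows from the fitted copula."""
    rng = random.Random(seed)
    rho = model["rho"]
    rows = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)  # corr(z1, z2) = rho
        u1, u2 = STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)  # back to uniforms on (0, 1)
        rows.append((empirical_quantile(model["x"], u1),
                     empirical_quantile(model["y"], u2)))
    return rows


# Hypothetical columns standing in for two wine features.
feature_a = [9.4, 9.8, 10.0, 10.5, 11.2, 11.9, 12.5, 13.1]
feature_b = [5.0, 5.5, 5.0, 6.0, 6.5, 6.0, 7.0, 7.5]
model = fit_gaussian_copula(feature_a, feature_b)
synthetic = sample(model, 200)
print(round(model["rho"], 3), synthetic[:3])
```

With more columns, the single `rho` becomes a full correlation matrix. The correlated draws then come from its Cholesky factor.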

Part 3: Automobile: PCA

Data

EDA and visualisation

Dimensionality reduction and classifier

Both models achieved above 90% accuracy. The SVM trained on the PCA-reduced data is only slightly less accurate, even though PCA discards a large share of the dimensions.
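PCA reduces the number of dimensions (features), not the number of rows. Every sample is kept, but it is described by fewer numbers: its coordinates along the directions of largest variance. The pure-Python sketch below finds the first principal component by power iteration on the covariance matrix and reports its explained-variance ratio. The data and helper names are hypothetical. In practice a library implementation, such as scikit-learn's `PCA`, which uses an SVD, would normally be used instead.

```python
import math
import random


def centre_and_covariance(rows):
    """Centre each column and return (centred rows, sample covariance matrix)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    return centred, cov


def first_component(cov, iterations=200, seed=0):
    """Power iteration: dominant eigenvector (unit length) and eigenvalue of cov."""
    d = len(cov)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random start, almost surely not orthogonal
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for a unit eigenvector.
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, eigenvalue


# Hypothetical two-feature data lying close to the line y = 2x.
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
centred, cov = centre_and_covariance(rows)
component, variance = first_component(cov)
explained = variance / sum(cov[i][i] for i in range(len(cov)))  # share of total variance
scores = [sum(c[j] * component[j] for j in range(len(c))) for c in centred]  # 1-D projection
print(component, explained)
print(scores)
```

Here one number per row (`scores`) keeps almost all of the variance of two. Further components can be found the same way after removing (deflating) the first one from the covariance matrix.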

Part 4: Sports Management

EDA and visualisation

Data-driven model

Part 5: Dimensionality Reduction Techniques

Dimensionality reduction techniques that can be implemented using Python:
1. Missing Value Ratio
2. Low Variance Filter
3. High Correlation Filter
4. Random Forest
5. Backward Feature Elimination
6. Forward Feature Selection
7. Factor Analysis
8. Principal Component Analysis
9. Independent Component Analysis
10. Methods Based on Projections
11. t-Distributed Stochastic Neighbor Embedding (t-SNE)
12. UMAP
13. Autoencoder
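The first three techniques in the list above are simple column filters. The minimal pure-Python sketch below applies them in turn:

- It drops columns with too many missing values (`None`).
- It drops near-constant columns.
- It keeps only one column from each highly correlated pair.

The column names and the thresholds `max_missing`, `min_variance` and `max_corr` are hypothetical illustration values, not recommended settings.

```python
import math


def missing_ratio(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def variance(values):
    """Population variance of the non-missing entries."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    mean = sum(present) / len(present)
    return sum((v - mean) ** 2 for v in present) / len(present)


def correlation(a, b):
    """Pearson correlation over rows where both values are present."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(pairs) < 2:
        return 0.0
    ma = sum(x for x, _ in pairs) / len(pairs)
    mb = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - ma) * (y - mb) for x, y in pairs)
    va = sum((x - ma) ** 2 for x, _ in pairs)
    vb = sum((y - mb) ** 2 for _, y in pairs)
    if va == 0 or vb == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / math.sqrt(va * vb)


def select_features(columns, max_missing=0.4, min_variance=1e-3, max_corr=0.9):
    """Apply the missing-value-ratio, low-variance and high-correlation filters in turn."""
    kept = [name for name, values in columns.items()
            if missing_ratio(values) <= max_missing and variance(values) >= min_variance]
    selected = []
    for name in kept:
        # Keep a column only if it is not highly correlated with one already kept.
        if all(abs(correlation(columns[name], columns[other])) < max_corr
               for other in selected):
            selected.append(name)
    return selected


columns = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "a_doubled": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with "a"
    "constant": [7.0, 7.0, 7.0, 7.0, 7.0],    # zero variance
    "sparse": [1.0, None, None, None, 2.0],   # 60% missing
    "b": [5.0, 3.0, 4.0, 1.0, 2.0],
}
print(select_features(columns))  # ['a', 'b']
```

Variance depends on the units of each column, so columns are usually normalised before the low-variance filter is applied.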

Dimensionality reduction on image data using the simplest possible autoencoder
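The phrase "simplest possible autoencoder" matches the example in the Keras blog tutorial "Building Autoencoders in Keras". That model uses one dense encoding layer and one dense decoding layer. Whether this project used that exact model is an assumption.

To show the idea without a deep-learning framework, the sketch below swaps in a linear autoencoder (no activations). It trains with plain full-batch gradient descent on the mean squared reconstruction error. The 4-pixel "images", the 2-number code size and all helper names are hypothetical.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def train_linear_autoencoder(data, code_size, epochs=3000, lr=0.05, seed=0):
    """Fit encoder W1 (code_size x d) and decoder W2 (d x code_size) by full-batch
    gradient descent on the mean squared reconstruction error."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(code_size)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(code_size)] for _ in range(d)]
    history = []  # loss before each update
    for _ in range(epochs):
        g1 = [[0.0] * d for _ in range(code_size)]
        g2 = [[0.0] * code_size for _ in range(d)]
        loss = 0.0
        for x in data:
            h = matvec(w1, x)        # encode: d values -> code_size values
            x_hat = matvec(w2, h)    # decode: code_size values -> d values
            err = [xh - xi for xh, xi in zip(x_hat, x)]
            loss += sum(e * e for e in err) / n
            for i in range(d):       # dLoss/dW2 = (2/n) * err * h^T
                for k in range(code_size):
                    g2[i][k] += 2.0 * err[i] * h[k] / n
            back = [sum(w2[i][k] * err[i] for i in range(d)) for k in range(code_size)]
            for k in range(code_size):  # dLoss/dW1 = (2/n) * (W2^T err) * x^T
                for j in range(d):
                    g1[k][j] += 2.0 * back[k] * x[j] / n
        history.append(loss)
        for k in range(code_size):
            for j in range(d):
                w1[k][j] -= lr * g1[k][j]
        for i in range(d):
            for k in range(code_size):
                w2[i][k] -= lr * g2[i][k]
    return w1, w2, history


# Hypothetical 4-pixel "images" whose pixels come in equal pairs (intrinsically 2-D).
images = [[0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5],
          [1.0, 1.0, 1.0, 1.0], [0.2, 0.2, 0.8, 0.8]]
w1, w2, history = train_linear_autoencoder(images, code_size=2)
codes = [matvec(w1, img) for img in images]  # each image reduced to 2 numbers
print(history[0], history[-1])
print(codes[0])
```

With no activations and a squared-error loss, such an autoencoder learns the same subspace as PCA (for centred data). Non-linear activations are what let autoencoders go beyond it.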